Model Invocation for Three Dimensional Scene Understanding
Abstract
Any general model-based vision system must somehow select a few serious candidates from its model base before applying model-directed processing. This is necessary both for efficiency and for recognising 'similar' models (i.e. handling data errors, generic models and previously unseen objects). This paper shows how one can integrate knowledge of object properties, structural and generic relations to create a network computation that performs model invocation. The paper demonstrates successful invocation in a scene containing a self- and externally obscured PUMA robot.
1 Introduction
One important and difficult task for a general model-based vision system is invoking the correct model. Because of the potentially huge number of possible objects, it is imperative that only a few serious candidates are selected for detailed consideration. Visual understanding must also include a pre-attentive element, because all models need be considered, yet active, direct comparison is computationally infeasible. Further, previously unseen objects, flexible objects seen in new configurations, incompletely visible objects (e.g. occlusion) and object variants (e.g. flaws, generics, new exemplars, etc.) require selecting models that are "close" to the data.
Model invocation is not just a visual problem (e.g. integrating cues while doing crossword puzzles or invoking situation schemas), but here only the visual problem is considered. Invocation associates clues that suggest rather than verify. It may support the "seeing" of nonexistent, but highly plausible objects, as in surrealist art.
To date, little work has been done on sophisticated model invocation in the context of 3D vision. Using easily measured properties to select potential models fails in large model bases, because many objects share similar properties. Further, data errors, generic objects, object substructure and occlusion complicate indexing.
Arbib [1] proposed a schema-based invocation process with activation levels based on evidence, competition and cooperation from related activity. Marr [6] considered direct search in a model base linked using specificity, adjunct and parent relations. Hinton and Lang [5] evaluated a connectionist model of invocation for 2D models, treating both model and data feature evidence identically. Feldman and Ballard [2] proposed a detailed computational model integrating evidence from spatially coincident property pairings.
This paper describes a solution that builds on these, embodying ideas on parallel networks, object description and representation in the context of 3D models and 3D visual information. The result is a plausibility calculation in a network structured according to constraints defined by the structural, generic and context relationships.
Acknowledgements
Portions of this work were supported by a University of Edinburgh studentship. Thanks go to R. Beattie, J. Hallam, D. Hogg, J. Howe, M. Orr and many others.
2 Problem Context
These results are from the IMAGINE project [3], which investigated recognising 3D objects starting from 3D scene information. Earlier stages of processing include:
1. exploiting 3D feature continuity to overcome occlusion,
2. grouping individual surfaces to form primitive and depth-aggregated surface clusters [4],
3. describing the significant scene features (curves, surfaces and volumes) by their 3D properties.
Model invocation happens at this point. After invocation, model-directed processes orient and verify the hypotheses.
Recognition starts from 2½D sketch-like data segmented into surface patches of nearly uniform shape and separated by various shape or obscuring boundaries. As no well-developed processes produce this data yet, the program input is from computer-augmented, hand-segmented test images.
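As a rough illustration of the plausibility network outlined in the introduction, the sketch below combines direct property evidence for each model with support from its structural subcomponents and its more general models. The update rule, the weights `w_sub` and `w_gen`, the threshold and all names and numbers are assumptions made for this example; the text above does not give the actual formulae, and context relationships are left out for brevity.

```python
def invoke(direct_evidence, subcomponents, generic_parents,
           iterations=10, w_sub=0.5, w_gen=0.5):
    """Iteratively combine direct evidence with support from related nodes.

    All plausibilities are kept in [-1, 1]. The update rule and the weights
    are illustrative assumptions, not the paper's formulae.
    """
    plaus = dict(direct_evidence)
    for _ in range(iterations):
        updated = {}
        for model, direct in direct_evidence.items():
            # Structural support: mean plausibility of the subcomponents.
            subs = subcomponents.get(model, [])
            sub_support = (
                sum(plaus.get(s, 0.0) for s in subs) / len(subs) if subs else 0.0
            )
            # Generic support: the best-supported more general model.
            gens = generic_parents.get(model, [])
            gen_support = max((plaus.get(g, 0.0) for g in gens), default=0.0)
            value = direct + w_sub * sub_support + w_gen * gen_support
            updated[model] = max(-1.0, min(1.0, value))
        plaus = updated
    return plaus


def candidates(plaus, threshold=0.5):
    """Return the models whose plausibility clears the invocation threshold."""
    return sorted(m for m, p in plaus.items() if p >= threshold)


# Hypothetical evidence values for a scene like the obscured robot example.
evidence = {"robot": 0.2, "upper_arm": 0.8, "lower_arm": 0.6, "chair": -0.4}
structure = {"robot": ["upper_arm", "lower_arm"]}
result = invoke(evidence, structure, generic_parents={})
print(candidates(result))
```

In this toy run the weak direct evidence for the whole robot is lifted above the threshold by its well-supported subcomponents, which is the kind of behaviour one would want when part of an object is obscured.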
The paper illustrates the invocation process using a test image of a PUMA robot with its gripper obscured, using the surfaces shown in figure 1. The scene has flexibly connected rigid solids and a variety of curved surfaces, and the robot is both externally and self-obscured.
Object models are primarily structural, with features attached by reference frame transformations. The main primitives are the surface patch (SURFACE), characterised by consistent surface shape and polycurve boundary, and hierarchical groupings of surfaces (ASSEMBLY). Four additional representations are added for invocation:
1. generic relationships between models,
2. major features of each assembly grouped according to viewpoint,
3. relevant properties with typical values for each structure, and
4. weighting factors modifying the importance of the properties and relationships.
Figure 1: Test Scene Surface Regions
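One way to picture the four invocation annotations is as extra fields on each model record. The sketch below is a hypothetical layout, not the IMAGINE data structures: the class names, the `tolerance` field and the linear evidence function are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PropertySpec:
    """A property with a typical value and an importance weight."""
    typical: float
    tolerance: float      # assumed: deviation at which evidence drops to zero
    weight: float = 1.0   # annotation 4: weighting factor


@dataclass
class Model:
    """A SURFACE or ASSEMBLY model carrying the invocation annotations."""
    name: str
    kind: str                                              # "SURFACE" or "ASSEMBLY"
    generic_parents: list = field(default_factory=list)    # annotation 1
    viewpoint_groups: dict = field(default_factory=dict)   # annotation 2
    properties: dict = field(default_factory=dict)         # annotation 3


def property_evidence(spec, measured):
    """Map a measurement to [-1, 1]: 1 at the typical value, 0 at the tolerance."""
    deviation = abs(measured - spec.typical) / spec.tolerance
    return max(-1.0, 1.0 - deviation)


def weighted_evidence(model, measurements):
    """Weighted mean of the evidence over the properties that were measured."""
    total = 0.0
    weight_sum = 0.0
    for name, spec in model.properties.items():
        if name in measurements:
            total += spec.weight * property_evidence(spec, measurements[name])
            weight_sum += spec.weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical model entry and measurements; the numbers are made up.
upper_arm = Model(
    name="upper_arm",
    kind="ASSEMBLY",
    generic_parents=["elongated_solid"],
    viewpoint_groups={"side": ["arm_side_surface", "arm_end_surface"]},
    properties={
        "area": PropertySpec(typical=100.0, tolerance=50.0, weight=2.0),
        "elongation": PropertySpec(typical=2.0, tolerance=4.0, weight=1.0),
    },
)
print(weighted_evidence(upper_arm, {"area": 100.0, "elongation": 4.0}))
```

Keeping the weight on each property makes it easy to down-weight properties that many models share, which is the indexing problem the introduction raises.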
Publication date: 1987